Bachelor in Data Science and Engineering
Bayesian Data Analysis
The objective of this project is to understand how to use the EM algorithm, Gibbs sampling and Variational Bayes. In order to do that, I would like to apply these concepts to a practical case study.
In order to create the Series.csv dataset, I made a Google Form and distributed it to people of different ages; the more varied the respondents, the more interesting the results. The questions asked on the form were the following:
(The survey was originally sent in Spanish, which is why the answers in the dataset are in that language.)
For this section, we are going to visualize the dataset and get a better understanding of what we have.
rm(list=ls())
setwd("D:/Documentos/Universidad/Tercero/Bayesian Data Analysis/datasets")
data <- read.csv2("Series.csv",sep = ",",dec = ".")
names(data)
## [1] "Time_Stamp" "Age" "Gender"
## [4] "x0.Recent" "x1.Subscription" "x2.Shared"
## [7] "x3.Pirate" "x4.in.company" "x5.meet.friends"
## [10] "x6.day.hrs" "x7.week.hrs" "x8.trends"
## [13] "x9.series.vs.films" "x10.cinema" "x11.smartphone"
## [16] "x12.recommend" "x13.be.recommend" "x14.late"
## [19] "x15.procrastinate" "x16.non.stop" "x17.too.much"
## [22] "x18.addict"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 4.00 8.00 14.88 19.50 100.00
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.000 2.000 3.255 4.000 20.000
par(mfrow = c(1,2))
pie(table(X$Age), main = "Age", radius = 1)
pie(table(X$Gender), main = "Gender", radius = 1)
par(mfrow = c(3,3))
for (i in 3:21){
  if (sapply(X[i], class)[1] == "character"){
    pie(table(X[, i]), main = colnames(X)[i], radius = 1)
  }
}
Y = X[X$Age %in% c("50-70", "70+"), ] #Dataset with age between 50-100
Z = X[X$Age %in% c("1-15", "16-21", "22-30", "31-50"), ] #Dataset with age between 1-49
## [1] "Instances of > 50 dataset: 96"
## [1] "Instances of < 50 dataset: 106"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 2.00 6.00 11.23 12.00 100.00
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 5.00 10.00 18.19 20.00 100.00
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.000 1.000 2.536 3.000 20.000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.000 2.000 3.906 5.000 15.000
## [1] "Hours per week mean of 50-70 dataset: 11.6373626373626"
## [1] "Hours per week mean of 16-21 dataset: 21.5769230769231"
For this part, we want to obtain K groups of profiles according to their series-watching habits.
For each observation, $\mathbf{x}=(x_1,\dots,x_M)$, we may assume a K-component mixture of multivariate binary (Bernoulli) variables with probability distribution:

$$p(\mathbf{x} \mid \mathbf{w}, \boldsymbol{\theta}) = \sum_{k=1}^{K} w_k \prod_{m=1}^{M} \theta_{km}^{x_m} (1-\theta_{km})^{1-x_m},$$

where $0 \le \theta_{km} \le 1$, for $k=1,\dots,K$ and $m=1,\dots,M$.
We assume the following priors:

$$\mathbf{w} \sim \text{Dirichlet}(\delta_1,\dots,\delta_K), \qquad \theta_{km} \sim \text{Beta}(\alpha, \beta).$$

In our case: $\delta_k = \alpha = \beta = 1$ (the package defaults).
The main idea is to define a latent set of variables, $z=\{z_1,\dots,z_N\}$, indicating the group of each respondent. The prior probability that the $i$-th respondent belongs to group $k$ is:

$$\Pr(z_i = k) = w_k,$$

and, given that the respondent is in group $k$:

$$p(\mathbf{x}_i \mid z_i = k) = \prod_{m=1}^{M} \theta_{km}^{x_{im}} (1-\theta_{km})^{1-x_{im}}.$$

Then, the complete-data likelihood function is:

$$p(\mathbf{x}, z \mid \mathbf{w}, \boldsymbol{\theta}) = \prod_{i=1}^{N} \prod_{k=1}^{K} \left[ w_k \prod_{m=1}^{M} \theta_{km}^{x_{im}} (1-\theta_{km})^{1-x_{im}} \right]^{\mathbb{1}(z_i = k)}.$$
As stated in the model assumptions, we want 19 binary variables. To achieve this we just have to set some rules for variables x6 and x7, and ignore the Time_Stamp, Age and Gender variables. As we have seen, the Gender variable does not contribute much, and Age is important but not for the clustering part; we will return to the age topic when commenting on the results.
X = data[4:22] #Ignore Time_stamp,Age and Gender
X <- ifelse(X == "Si", 1, 0) #Binarizing the variables ("Si" = "Yes")
X = as.data.frame(X)
X$x6.day.hrs = data$x6.day.hrs #Recovering the x6 and x7 columns from the original dataset
X$x7.week.hrs = data$x7.week.hrs
hours_week = 24*7
sleep_time = 8 * 7 #Assuming 8 hours as the mean of hours sleeping
work_time = 8 * 5 #Assuming full time jobs of 8 hours per day with free weekends
eating_time = 2 * 7 #Assuming 2 hours a day spend in having lunch, having dinner, etc.
free_time = hours_week - sleep_time - work_time - eating_time #58 hours a week of free time
print(paste0("Average free time per week: ",free_time))
## [1] "Average free time per week: 58"
Given that on average we have 58 hours of free time per week (young people should have more), and given the previously shown density plots of hours per week (most of the density lies between 0 and 25), I have decided that 30 hours a week dedicated to watching series will be the threshold to input 1 in variable x7; otherwise, the inputted value will be 0.
If we divide these 30 hours by 7, we obtain around 4.28 hours a day, so I am going to input a 1 if the respondent answered 4.5 or more and 0 otherwise.
X$x6.day.hrs = ifelse(X$x6.day.hrs >= 4.5,1,0)
X$x7.week.hrs = ifelse(X$x7.week.hrs >= 30,1,0)
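The thresholding rule can be sanity-checked on a few hypothetical answers (toy values chosen for illustration, not rows from the survey):

```r
# Hypothetical answers to "hours per day" and "hours per week":
hours_day  <- c(0.5, 2, 5, 4.5, 1)
hours_week <- c(3, 14, 35, 30, 7)

x6 <- ifelse(hours_day  >= 4.5, 1, 0)  # 1 = 4.5 or more hours a day
x7 <- ifelse(hours_week >= 30,  1, 0)  # 1 = 30 or more hours a week
x6  # 0 0 1 1 0
x7  # 0 0 1 1 0
```

Both rules flag the same two respondents here, which is expected since 30 hours a week corresponds to roughly 4.3 hours a day.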
In order to implement this LCA (latent class analysis), we are going to use the BayesLCA package.
library(BayesLCA)
## Loading required package: e1071
## Loading required package: coda
set.seed(100)
First we are going to apply the algorithm with the default parameters but with K=3 (in this library, K is renamed G) (delta=beta=alpha=1, restarts=5, iterations=500):
fit.EM=blca(X, G=3, method = "em")
## Restart number 1, logpost = -1726.69...
## New maximum found... Restart number 2, logpost = -1726.61...
## Restart number 3, logpost = -1726.7...
## Restart number 4, logpost = -1726.7...
## Restart number 5, logpost = -1726.68...
As the EM algorithm is susceptible to converging to a local maximum or a saddle point, we are going to increase the restarts from the default 5 to 15:
fit.EM=blca(X, G=3, method = "em", restarts = 15)
## Restart number 1, logpost = -1726.62...
## Restart number 2, logpost = -1726.69...
## Restart number 3, logpost = -1726.69...
## Restart number 4, logpost = -1743.81...
## New maximum found... Restart number 5, logpost = -1726.62...
## Restart number 6, logpost = -1744.18...
## Restart number 7, logpost = -1729.76...
## Restart number 8, logpost = -1726.68...
## Restart number 9, logpost = -1744.18...
## Restart number 10, logpost = -1726.62...
## Restart number 11, logpost = -1726.69...
## Restart number 12, logpost = -1726.7...
## New maximum found... Restart number 13, logpost = -1726.62...
## Restart number 14, logpost = -1726.69...
## Restart number 15, logpost = -1726.7...
We can check the MAP estimates of the model parameters:
print(fit.EM)
##
## MAP Estimates:
##
##
## Item Probabilities:
##
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## Group 1 0.874 0.993 0.792 0.100 0.806
## Group 2 0.938 0.933 0.818 0.721 0.346
## Group 3 0.364 0.311 0.079 0.145 0.480
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## Group 1 0.421 0.232 0.109 0.440 0.459
## Group 2 0.739 0.368 0.276 0.681 0.627
## Group 3 0.150 0.000 0.000 0.102 0.202
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## Group 1 0.041 0.147 0.946 1.000 0.166
## Group 2 0.082 0.316 1.000 1.000 0.632
## Group 3 0.091 0.093 0.567 0.923 0.140
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## Group 1 0.082 0.247 0.075 0.021
## Group 2 0.668 0.668 0.159 0.266
## Group 3 0.000 0.110 0.105 0.026
##
## Membership Probabilities:
##
## Group 1 Group 2 Group 3
## 0.473 0.335 0.193
## Warning: Posterior standard deviations not returned.
Also, we can check the prior parameters:
summary(fit.EM)
## __________________
##
## Bayes-LCA
## Diagnostic Summary
## __________________
##
## Hyper-Parameters:
##
## Item Probabilities:
##
## alpha:
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## Group 1 1 1 1 1 1
## Group 2 1 1 1 1 1
## Group 3 1 1 1 1 1
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## Group 1 1 1 1 1 1
## Group 2 1 1 1 1 1
## Group 3 1 1 1 1 1
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## Group 1 1 1 1 1 1
## Group 2 1 1 1 1 1
## Group 3 1 1 1 1 1
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## Group 1 1 1 1 1
## Group 2 1 1 1 1
## Group 3 1 1 1 1
##
## beta:
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## Group 1 1 1 1 1 1
## Group 2 1 1 1 1 1
## Group 3 1 1 1 1 1
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## Group 1 1 1 1 1 1
## Group 2 1 1 1 1 1
## Group 3 1 1 1 1 1
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## Group 1 1 1 1 1 1
## Group 2 1 1 1 1 1
## Group 3 1 1 1 1 1
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## Group 1 1 1 1 1
## Group 2 1 1 1 1
## Group 3 1 1 1 1
##
## Class Probabilities:
##
## delta:
## Group 1 Group 2 Group 3
## 1 1 1
## __________________
##
## Method: EM algorithm
##
## Number of iterations: 19
##
## Log-Posterior Increase at Convergence: 0.001168029
##
## Log-Posterior: -1726.615
##
## AIC: -3569.844
##
## BIC: -3765.032
par(mfrow=c(1,1))
plot(fit.EM)
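What `blca(..., method = "em")` iterates can be sketched as a plain EM loop for a Bernoulli mixture. The following is a minimal self-contained illustration on simulated 0/1 data (variable names are ours, not the package's):

```r
# Plain EM for a K-component Bernoulli mixture (sketch; simulated data,
# NOT the survey dataset):
set.seed(2)
N <- 300; M <- 6; K <- 2
theta_true <- rbind(rep(0.85, M), rep(0.15, M))
z <- sample(1:K, N, replace = TRUE)
x <- matrix(rbinom(N * M, 1, theta_true[z, ]), N, M)

w <- rep(1 / K, K)
theta <- matrix(runif(K * M, 0.3, 0.7), K, M)  # random restart point
for (it in 1:100) {
  # E-step: responsibilities r[i, k] = P(z_i = k | x_i, w, theta)
  logr <- sapply(1:K, function(k)
    log(w[k]) + x %*% log(theta[k, ]) + (1 - x) %*% log(1 - theta[k, ]))
  r <- exp(logr - apply(logr, 1, max))
  r <- r / rowSums(r)
  # M-step: weighted estimates (equal to the MAP under flat Beta/Dirichlet priors)
  w <- colMeans(r)
  theta <- t(r) %*% x / colSums(r)
}
round(theta, 2)  # each row should approach 0.85... or 0.15...
```

Running this from several random starting points and keeping the best log-posterior is exactly what the `restarts` argument automates.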
Now we are going to try different priors:
fit.EM.prior2=blca.em(X, G=3,restarts=15,alpha=2,beta=2, verbose = FALSE)
print(fit.EM.prior2)
##
## MAP Estimates:
##
##
## Item Probabilities:
##
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## Group 1 0.870 0.977 0.779 0.124 0.786
## Group 2 0.924 0.919 0.812 0.735 0.340
## Group 3 0.356 0.311 0.095 0.169 0.466
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## Group 1 0.420 0.230 0.113 0.452 0.460
## Group 2 0.756 0.387 0.297 0.670 0.632
## Group 3 0.168 0.030 0.029 0.124 0.213
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## Group 1 0.054 0.155 0.942 0.990 0.184
## Group 2 0.093 0.332 0.981 0.985 0.643
## Group 3 0.112 0.113 0.553 0.899 0.161
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## Group 1 0.110 0.259 0.087 0.031
## Group 2 0.671 0.680 0.169 0.289
## Group 3 0.031 0.133 0.127 0.053
##
## Membership Probabilities:
##
## Group 1 Group 2 Group 3
## 0.503 0.310 0.187
## Warning: Posterior standard deviations not returned.
fit.EM.prior0=blca.em(X, G=3,restarts=15,alpha=0.001,beta=0.001,verbose=FALSE)
print(fit.EM.prior0)
##
## MAP Estimates:
##
##
## Item Probabilities:
##
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## Group 1 0.761 0.828 0.617 0.164 0.720
## Group 2 1.000 1.000 0.882 0.718 0.362
## Group 3 0.000 0.000 0.000 0.000 0.000
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## Group 1 0.365 0.147 0.076 0.384 0.388
## Group 2 0.808 0.454 0.303 0.690 0.682
## Group 3 0.000 0.000 0.000 0.000 0.155
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## Group 1 0.089 0.156 0.889 0.985 0.218
## Group 2 0.000 0.288 1.000 1.000 0.587
## Group 3 0.000 0.000 0.150 1.000 0.000
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## Group 1 0.098 0.261 0.099 0.040
## Group 2 0.680 0.647 0.120 0.246
## Group 3 0.000 0.000 0.000 0.000
##
## Membership Probabilities:
##
## Group 1 Group 2 Group 3
## 0.676 0.285 0.039
## Warning: Posterior standard deviations not returned.
par(mfrow=c(2,2))
plot(fit.EM,main="Alpha=1, Beta=1")
plot(fit.EM.prior2, main="Alpha=2, Beta=2")
plot(fit.EM.prior0,main="Alpha=0.001, Beta=0.001")
We can see that the fit with the alpha=2, beta=2 prior is practically identical to the one with the uniform prior; on the other hand, the "improper" non-informative prior is completely off.
Now, we can try different values of K and use the BIC criterion to see which mixture size fits better:
fit.EM.k.2=blca(X, 2, method = "em", restarts=15, verbose=FALSE)
fit.EM.k.4=blca(X, 4, method = "em", restarts=15, verbose=FALSE)
print(fit.EM.k.2)
##
## MAP Estimates:
##
##
## Item Probabilities:
##
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## Group 1 0.939 0.966 0.846 0.425 0.561
## Group 2 0.521 0.601 0.309 0.107 0.643
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## Group 1 0.618 0.353 0.218 0.604 0.600
## Group 2 0.197 0.000 0.000 0.167 0.204
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## Group 1 0.066 0.259 0.985 1.000 0.429
## Group 2 0.061 0.066 0.709 0.956 0.099
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## Group 1 0.398 0.468 0.117 0.149
## Group 2 0.000 0.154 0.094 0.016
##
## Membership Probabilities:
##
## Group 1 Group 2
## 0.66 0.34
## Warning: Posterior standard deviations not returned.
print(fit.EM.k.4)
##
## MAP Estimates:
##
##
## Item Probabilities:
##
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## Group 1 0.933 1.000 0.859 0.137 0.784
## Group 2 0.423 0.461 0.166 0.135 0.579
## Group 3 0.924 0.888 0.855 0.723 0.094
## Group 4 0.905 1.000 0.732 0.677 0.658
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## Group 1 0.483 0.203 0.058 0.489 0.494
## Group 2 0.147 0.000 0.000 0.156 0.203
## Group 3 0.676 0.096 0.088 0.706 0.477
## Group 4 0.796 0.945 0.752 0.580 0.851
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## Group 1 0.061 0.154 1.000 1.000 0.243
## Group 2 0.074 0.078 0.613 0.942 0.120
## Group 3 0.109 0.516 1.000 1.000 0.673
## Group 4 0.000 0.102 0.928 1.000 0.441
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## Group 1 0.158 0.220 0.076 0.014
## Group 2 0.000 0.161 0.088 0.022
## Group 3 0.676 0.828 0.296 0.196
## Group 4 0.530 0.556 0.000 0.417
##
## Membership Probabilities:
##
## Group 1 Group 2 Group 3 Group 4
## 0.426 0.256 0.182 0.137
## Warning: Posterior standard deviations not returned.
par(mfrow=c(1,2))
plot(fit.EM.k.2,main="K = 2")
plot(fit.EM.k.4, main="K = 4")
-fit.EM$BIC
## [1] 3765.032
-fit.EM.k.2$BIC
## [1] 3778.596
-fit.EM.k.4$BIC
## [1] 3788.674
Note that this library calculates the BIC as

$$\mathrm{BIC} = 2 \log \hat{L} - \nu \log N$$

(so larger values are better, which is why we negate them before comparing), and not as the classical

$$\mathrm{BIC} = -2 \log \hat{L} + \nu \log N,$$

where $\nu$ is the number of free parameters and $N$ the number of observations. According to this criterion, the best value is the one for K=3. Finally, given the estimated model parameters, we can obtain the predictive class probabilities for a given observation.
Xnew = X[51,]
Zscore(Xnew,fit=fit.EM)
## [,1] [,2] [,3]
## 1100100011001100101 0.8150426 0.1768332 0.008124268
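`Zscore` essentially applies Bayes' theorem using the fitted MAP estimates. A hand-rolled version could look as follows; `class_posterior` is our own illustrative helper, not a BayesLCA function, and the toy parameters are made up:

```r
# Posterior class probabilities for one 0/1 response vector x, given mixture
# weights w (length K) and item probabilities theta (K x M):
class_posterior <- function(x, w, theta) {
  lik <- apply(theta, 1, function(th) prod(th^x * (1 - th)^(1 - x)))
  w * lik / sum(w * lik)
}

# Toy check with made-up parameters (not the fitted model):
w_toy <- c(0.5, 0.5)
theta_toy <- rbind(c(0.9, 0.9), c(0.1, 0.1))
round(class_posterior(c(1, 1), w_toy, theta_toy), 3)  # 0.988 0.012
```

A respondent answering "yes" to both items is assigned almost entirely to the high-probability class, mirroring how the observation above gets probability 0.815 for Group 1.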
With the previous algorithm we could only obtain the MAP estimates; with Gibbs sampling we also want to obtain the whole posterior distribution.
Gibbs sampling is a particular MCMC method applicable when the conditional posterior distributions are known.
In order to obtain a sample from the joint posterior, we can sample iteratively from the full conditional distributions:

$$\mathbf{w} \mid z \sim \text{Dirichlet}(\delta_1 + n_1, \dots, \delta_K + n_K),$$

$$\theta_{km} \mid \mathbf{x}, z \sim \text{Beta}\!\Big(\alpha + \textstyle\sum_{i:\, z_i = k} x_{im},\; \beta + n_k - \textstyle\sum_{i:\, z_i = k} x_{im}\Big),$$

$$\Pr(z_i = k \mid \mathbf{x}_i, \mathbf{w}, \boldsymbol{\theta}) \propto w_k \prod_{m=1}^{M} \theta_{km}^{x_{im}} (1-\theta_{km})^{1-x_{im}},$$

where $n_k$ is the number of respondents currently assigned to group $k$. This Gibbs sampling algorithm is easy to implement since the conditional posterior distributions have a closed form; we say that we have chosen semiconjugate priors.
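These conditional updates can be sketched as a minimal Gibbs sampler for the Bernoulli mixture. This is an illustration on simulated 0/1 data with flat priors, not the package's internal code:

```r
# Minimal Gibbs sampler for a K-component Bernoulli mixture (sketch;
# simulated data, NOT the survey dataset):
set.seed(1)
N <- 200; M <- 5; K <- 2
theta_true <- rbind(rep(0.9, M), rep(0.1, M))
z <- sample(1:K, N, replace = TRUE)
x <- matrix(rbinom(N * M, 1, theta_true[z, ]), N, M)

delta <- rep(1, K); alpha <- 1; beta <- 1   # flat priors, as in the blca defaults
w <- rep(1 / K, K)
theta <- matrix(0.5, K, M)
for (it in 1:500) {
  # z_i | w, theta: categorical, proportional to w_k * p(x_i | theta_k)
  logp <- sapply(1:K, function(k)
    log(w[k]) + x %*% log(theta[k, ]) + (1 - x) %*% log(1 - theta[k, ]))
  p <- exp(logp - apply(logp, 1, max))
  p <- p / rowSums(p)
  z <- apply(p, 1, function(pr) sample(1:K, 1, prob = pr))
  # w | z: Dirichlet(delta + counts), drawn via independent Gammas
  n_k <- tabulate(z, K)
  g <- rgamma(K, delta + n_k)
  w <- g / sum(g)
  # theta_km | x, z: Beta(alpha + successes, beta + failures)
  for (k in 1:K) {
    s <- colSums(x[z == k, , drop = FALSE])
    theta[k, ] <- rbeta(M, alpha + s, beta + n_k[k] - s)
  }
}
```

Keeping the draws of `w` and `theta` across iterations (after a burn-in) yields the posterior samples that `blca(..., method = "gibbs")` summarises below.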
Now, we can apply the Gibbs sampling algorithm to the series data. We initially use K=3 and the default priors:
fit.GS=blca(X, 3, method = "gibbs")
## Initialising sampler...starting burn-in.
## Burn-in completed...
## 1000 of 5000 samples completed...
## 2000 of 5000 samples completed...
## 3000 of 5000 samples completed...
## 4000 of 5000 samples completed...
## 5000 of 5000 samples completed...
## Sampling run completed.
print(fit.GS)
##
## MAP Estimates:
##
##
## Item Probabilities:
##
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## Group 1 0.870 0.969 0.775 0.146 0.775
## Group 2 0.920 0.917 0.814 0.729 0.321
## Group 3 0.358 0.311 0.104 0.169 0.468
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## Group 1 0.429 0.232 0.113 0.457 0.463
## Group 2 0.767 0.392 0.305 0.666 0.635
## Group 3 0.163 0.033 0.031 0.120 0.218
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## Group 1 0.055 0.159 0.943 0.991 0.197
## Group 2 0.095 0.344 0.979 0.983 0.657
## Group 3 0.110 0.108 0.546 0.896 0.157
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## Group 1 0.126 0.266 0.088 0.032
## Group 2 0.672 0.689 0.176 0.308
## Group 3 0.033 0.139 0.129 0.056
##
## Membership Probabilities:
##
## Group 1 Group 2 Group 3
## 0.514 0.295 0.190
##
## Posterior Standard Deviation Estimates:
##
##
## Item Probabilities:
##
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## Group 1 0.050 0.025 0.062 0.060 0.050
## Group 2 0.039 0.040 0.057 0.074 0.093
## Group 3 0.090 0.129 0.066 0.066 0.104
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## Group 1 0.064 0.053 0.038 0.064 0.063
## Group 2 0.072 0.071 0.068 0.069 0.070
## Group 3 0.068 0.031 0.031 0.067 0.072
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## Group 1 0.025 0.039 0.028 0.010 0.051
## Group 2 0.041 0.075 0.021 0.017 0.086
## Group 3 0.053 0.055 0.102 0.053 0.065
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## Group 1 0.055 0.054 0.031 0.021
## Group 2 0.080 0.079 0.056 0.080
## Group 3 0.031 0.062 0.058 0.040
##
## Membership Probabilities:
##
## Group 1 Group 2 Group 3
## 0.058 0.063 0.048
Before looking at some result plots, we are going to check whether the algorithm is performing well by running a convergence diagnostic.
par(mfrow = c(3, 3))
plot(fit.GS, which = 5)
raftery.diag(as.mcmc(fit.GS))
##
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95
##
## Burn-in Total Lower bound Dependence
## (M) (N) (Nmin) factor (I)
## ClassProb 1 8 10734 3746 2.870
## ClassProb 2 30 35040 3746 9.350
## ClassProb 3 8 8420 3746 2.250
## ItemProb 1 1 4 5211 3746 1.390
## ItemProb 1 2 12 19137 3746 5.110
## ItemProb 1 3 8 10116 3746 2.700
## ItemProb 1 4 6 8930 3746 2.380
## ItemProb 1 5 3 4198 3746 1.120
## ItemProb 1 6 3 4267 3746 1.140
## ItemProb 1 7 3 4267 3746 1.140
## ItemProb 1 8 5 5673 3746 1.510
## ItemProb 1 9 4 4994 3746 1.330
## ItemProb 1 10 3 4484 3746 1.200
## ItemProb 1 11 4 5211 3746 1.390
## ItemProb 1 12 3 4484 3746 1.200
## ItemProb 1 13 3 4410 3746 1.180
## ItemProb 1 14 2 3741 3746 0.999
## ItemProb 1 15 4 4955 3746 1.320
## ItemProb 1 16 8 9056 3746 2.420
## ItemProb 1 17 4 4728 3746 1.260
## ItemProb 1 18 4 4713 3746 1.260
## ItemProb 1 19 4 5124 3746 1.370
## ItemProb 2 1 3 4129 3746 1.100
## ItemProb 2 2 3 4410 3746 1.180
## ItemProb 2 3 3 4198 3746 1.120
## ItemProb 2 4 3 4484 3746 1.200
## ItemProb 2 5 22 21906 3746 5.850
## ItemProb 2 6 3 4410 3746 1.180
## ItemProb 2 7 3 4267 3746 1.140
## ItemProb 2 8 2 3741 3746 0.999
## ItemProb 2 9 3 4267 3746 1.140
## ItemProb 2 10 3 4129 3746 1.100
## ItemProb 2 11 3 4129 3746 1.100
## ItemProb 2 12 3 4062 3746 1.080
## ItemProb 2 13 2 3995 3746 1.070
## ItemProb 2 14 3 4062 3746 1.080
## ItemProb 2 15 3 4267 3746 1.140
## ItemProb 2 16 3 4198 3746 1.120
## ItemProb 2 17 3 4062 3746 1.080
## ItemProb 2 18 3 4484 3746 1.200
## ItemProb 2 19 2 3995 3746 1.070
## ItemProb 3 1 4 5124 3746 1.370
## ItemProb 3 2 12 16167 3746 4.320
## ItemProb 3 3 6 10102 3746 2.700
## ItemProb 3 4 3 4097 3746 1.090
## ItemProb 3 5 6 6406 3746 1.710
## ItemProb 3 6 10 12624 3746 3.370
## ItemProb 3 7 2 3930 3746 1.050
## ItemProb 3 8 2 3866 3746 1.030
## ItemProb 3 9 10 10206 3746 2.720
## ItemProb 3 10 2 3995 3746 1.070
## ItemProb 3 11 2 3995 3746 1.070
## ItemProb 3 12 4 4728 3746 1.260
## ItemProb 3 13 8 9152 3746 2.440
## ItemProb 3 14 3 4198 3746 1.120
## ItemProb 3 15 2 3995 3746 1.070
## ItemProb 3 16 2 3561 3746 0.951
## ItemProb 3 17 4 4792 3746 1.280
## ItemProb 3 18 2 3866 3746 1.030
## ItemProb 3 19 3 4267 3746 1.140
We can see that the diagnostic shows quick convergence (low burn-in values), but not-so-good mixing (the dependence factors of several parameters are somewhat large).
With this analysis in mind, we can run another model with better sampler settings:
fit.GS.tuned=blca(X, 3, method = "gibbs", burn.in = 50, thin = 1/5, iter = 20000)
## Initialising sampler...starting burn-in.
## Burn-in completed...
## 1000 of 20000 samples completed...
## 2000 of 20000 samples completed...
## 3000 of 20000 samples completed...
## 4000 of 20000 samples completed...
## 5000 of 20000 samples completed...
## 6000 of 20000 samples completed...
## 7000 of 20000 samples completed...
## 8000 of 20000 samples completed...
## 9000 of 20000 samples completed...
## 10000 of 20000 samples completed...
## 11000 of 20000 samples completed...
## 12000 of 20000 samples completed...
## 13000 of 20000 samples completed...
## 14000 of 20000 samples completed...
## 15000 of 20000 samples completed...
## 16000 of 20000 samples completed...
## 17000 of 20000 samples completed...
## 18000 of 20000 samples completed...
## 19000 of 20000 samples completed...
## 20000 of 20000 samples completed...
## Sampling run completed.
print(fit.GS.tuned)
##
## MAP Estimates:
##
##
## Item Probabilities:
##
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## Group 1 0.867 0.969 0.772 0.139 0.777
## Group 2 0.921 0.919 0.815 0.727 0.329
## Group 3 0.359 0.302 0.102 0.175 0.463
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## Group 1 0.424 0.228 0.111 0.453 0.459
## Group 2 0.765 0.394 0.304 0.664 0.634
## Group 3 0.160 0.033 0.033 0.119 0.214
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## Group 1 0.055 0.157 0.942 0.990 0.192
## Group 2 0.094 0.340 0.979 0.984 0.652
## Group 3 0.114 0.110 0.543 0.892 0.160
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## Group 1 0.123 0.262 0.087 0.030
## Group 2 0.667 0.682 0.173 0.303
## Group 3 0.033 0.138 0.132 0.058
##
## Membership Probabilities:
##
## Group 1 Group 2 Group 3
## 0.511 0.303 0.186
##
## Posterior Standard Deviation Estimates:
##
##
## Item Probabilities:
##
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## Group 1 0.049 0.026 0.063 0.057 0.051
## Group 2 0.038 0.038 0.057 0.073 0.083
## Group 3 0.091 0.127 0.066 0.068 0.104
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## Group 1 0.062 0.051 0.038 0.062 0.060
## Group 2 0.072 0.070 0.066 0.067 0.070
## Group 3 0.068 0.032 0.031 0.068 0.071
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## Group 1 0.026 0.039 0.028 0.009 0.051
## Group 2 0.041 0.071 0.020 0.017 0.082
## Group 3 0.056 0.055 0.103 0.055 0.066
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## Group 1 0.053 0.052 0.031 0.020
## Group 2 0.076 0.074 0.054 0.077
## Group 3 0.032 0.064 0.061 0.041
##
## Membership Probabilities:
##
## Group 1 Group 2 Group 3
## 0.057 0.059 0.047
We can now observe the plots of the posterior densities of the model parameters: the item probabilities conditional on class membership, and the class probabilities:
par(mfrow=c(3,3))
plot(fit.GS.tuned,which=3)
par(mfrow=c(1,1))
plot(fit.GS.tuned,which=4)
par(mfrow=c(3,3))
plot(fit.GS.tuned,which=5)
We could also try different priors, but as the computational cost is too high, I will just show the lines without running them. For the number of classes K, we look only at the BIC criterion in order to confirm that K = 3 gives the best mixture:
## Initialising sampler...starting burn-in.
## Burn-in completed...
## 1000 of 10000 samples completed...
## 2000 of 10000 samples completed...
## 3000 of 10000 samples completed...
## 4000 of 10000 samples completed...
## 5000 of 10000 samples completed...
## 6000 of 10000 samples completed...
## 7000 of 10000 samples completed...
## 8000 of 10000 samples completed...
## 9000 of 10000 samples completed...
## 10000 of 10000 samples completed...
## Sampling run completed.
## Initialising sampler...starting burn-in.
## Burn-in completed...
## 1000 of 20000 samples completed...
## 2000 of 20000 samples completed...
## 3000 of 20000 samples completed...
## 4000 of 20000 samples completed...
## 5000 of 20000 samples completed...
## 6000 of 20000 samples completed...
## 7000 of 20000 samples completed...
## 8000 of 20000 samples completed...
## 9000 of 20000 samples completed...
## 10000 of 20000 samples completed...
## 11000 of 20000 samples completed...
## 12000 of 20000 samples completed...
## 13000 of 20000 samples completed...
## 14000 of 20000 samples completed...
## 15000 of 20000 samples completed...
## 16000 of 20000 samples completed...
## 17000 of 20000 samples completed...
## 18000 of 20000 samples completed...
## 19000 of 20000 samples completed...
## 20000 of 20000 samples completed...
## Sampling run completed.
-fit.GS.tuned$BIC
## [1] 3810.303
-fit.GS.k.2$BIC
## [1] 3812.406
-fit.GS.k.4$BIC
## [1] 3847.216
In this case we can see that K=2 and K=3 are pretty close, and although K=2 wins in some runs and K=3 in others, we will keep working with K=3 in order to obtain the three main groups: addicted, non-addicted, and indifferent to series.
The idea consists in approximating the posterior distribution with a variational distribution which assumes independence among blocks of parameters:

$$q(\boldsymbol{\theta}, z \mid \boldsymbol{\phi}) = \prod_{j} q_j(\theta_j \mid \phi_j),$$

where the $\phi_j$ are the variational parameters.
The VB approach looks for the distributions qj that minimize the Kullback-Leibler divergence between the posterior and variational approximation.
In mixture models, it can be shown that the form of qj is the same as that of the conditional posterior distribution.
The variational parameters are updated iteratively until the KL divergence is minimized.
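The equivalence between minimizing the KL divergence and the iterative updates follows from the standard evidence decomposition, which we can write out explicitly:

```latex
\log p(\mathbf{x}) \;=\;
\underbrace{\mathbb{E}_q\!\left[\log p(\mathbf{x}, \boldsymbol{\theta}, z)\right]
 - \mathbb{E}_q\!\left[\log q(\boldsymbol{\theta}, z)\right]}_{\mathrm{ELBO}(q)}
\;+\;
\mathrm{KL}\!\left(q(\boldsymbol{\theta}, z)\,\Vert\,p(\boldsymbol{\theta}, z \mid \mathbf{x})\right).
```

Because $\log p(\mathbf{x})$ does not depend on $q$, each coordinate update of a factor $q_j$ that increases the ELBO necessarily decreases the KL divergence, which is why the iterations converge monotonically.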
Now, we apply the Variational Bayes (VB) algorithm to the series data:
fit.VB=blca(X, 3, method = "vb")
## Restart number 1, logpost = -1863.01...
print(fit.VB)
##
## MAP Estimates:
##
##
## Item Probabilities:
##
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## Group 1 0.859 0.985 0.761 0.109 0.790
## Group 2 0.940 0.931 0.823 0.738 0.341
## Group 3 0.334 0.227 0.057 0.160 0.441
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## Group 1 0.402 0.217 0.101 0.441 0.447
## Group 2 0.762 0.380 0.287 0.678 0.635
## Group 3 0.156 0.001 0.001 0.077 0.198
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## Group 1 0.044 0.145 0.945 1.000 0.168
## Group 2 0.081 0.322 0.998 1.000 0.644
## Group 3 0.096 0.096 0.521 0.911 0.154
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## Group 1 0.093 0.251 0.077 0.021
## Group 2 0.674 0.676 0.157 0.277
## Group 3 0.001 0.104 0.118 0.030
##
## Membership Probabilities:
##
## Group 1 Group 2 Group 3
## 0.516 0.317 0.167
##
## Posterior Standard Deviation Estimates:
##
##
## Item Probabilities:
##
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## Group 1 0.034 0.015 0.041 0.031 0.040
## Group 2 0.032 0.033 0.048 0.054 0.058
## Group 3 0.078 0.071 0.045 0.063 0.082
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## Group 1 0.047 0.040 0.030 0.048 0.048
## Group 2 0.053 0.059 0.056 0.057 0.059
## Group 3 0.063 0.028 0.028 0.050 0.068
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## Group 1 0.022 0.035 0.024 0.009 0.037
## Group 2 0.036 0.057 0.016 0.015 0.059
## Group 3 0.053 0.053 0.083 0.052 0.063
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## Group 1 0.029 0.042 0.027 0.017
## Group 2 0.057 0.057 0.046 0.055
## Group 3 0.028 0.055 0.057 0.038
##
## Membership Probabilities:
##
## Group 1 Group 2 Group 3
## 0.035 0.032 0.026
Observe that the VB method is much faster than Gibbs sampling, and it also provides posterior standard deviation estimates.
fit.VB$itemprob
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## [1,] 0.8588870 0.9850253 0.76086159 0.1085302 0.7895723
## [2,] 0.9397215 0.9312647 0.82346703 0.7380368 0.3406571
## [3,] 0.3339084 0.2268142 0.05678021 0.1598690 0.4414827
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## [1,] 0.4021508 0.216619346 0.1012336036 0.44082576 0.4471899
## [2,] 0.7619196 0.380476637 0.2873263199 0.67821731 0.6354185
## [3,] 0.1558255 0.001013152 0.0009432598 0.07663585 0.1978054
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## [1,] 0.04402172 0.14517401 0.9449849 0.9999985 0.1683718
## [2,] 0.08082083 0.32213651 0.9978032 1.0000000 0.6437150
## [3,] 0.09596543 0.09564748 0.5211017 0.9109057 0.1541671
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## [1,] 0.093463816 0.2509851 0.07663344 0.02110870
## [2,] 0.674124467 0.6760970 0.15677631 0.27729064
## [3,] 0.001478075 0.1040733 0.11771631 0.03049472
fit.VB$classprob
## [1] 0.5160120 0.3173026 0.1666854
fit.VB$itemprob.sd
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## [1,] 0.03427877 0.01481141 0.04147807 0.03091188 0.03973103
## [2,] 0.03187497 0.03345522 0.04753483 0.05414695 0.05805332
## [3,] 0.07840403 0.07073988 0.04521453 0.06329694 0.08206263
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## [1,] 0.04738550 0.04012982 0.03006293 0.04795733 0.04802399
## [2,] 0.05257888 0.05937866 0.05560678 0.05727905 0.05889846
## [3,] 0.06276362 0.02770660 0.02767599 0.04962291 0.06781280
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## [1,] 0.02155839 0.03465624 0.02353048 0.009325663 0.03666081
## [2,] 0.03554304 0.05729446 0.01589993 0.014902511 0.05862226
## [3,] 0.05339506 0.05333654 0.08250247 0.052105426 0.06254173
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## [1,] 0.02911369 0.04212669 0.02687401 0.01650633
## [2,] 0.05745521 0.05737089 0.04555160 0.05505337
## [3,] 0.02790921 0.05485095 0.05715416 0.03822850
fit.VB$classprob.sd
## [1] 0.03482425 0.03243420 0.02611784
The MAP estimates are close to those obtained with the Gibbs sampling algorithm:
fit.GS$itemprob
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## [1,] 0.8699643 0.9694923 0.7750651 0.1458995 0.7753410
## [2,] 0.9198235 0.9166612 0.8137024 0.7288890 0.3207982
## [3,] 0.3577859 0.3114059 0.1042425 0.1692080 0.4681448
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## [1,] 0.4286518 0.23180909 0.11325459 0.4572638 0.4626439
## [2,] 0.7674769 0.39189486 0.30536077 0.6662314 0.6347646
## [3,] 0.1625938 0.03321466 0.03126934 0.1200338 0.2182409
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## [1,] 0.05460469 0.1587203 0.9430629 0.9905514 0.1965169
## [2,] 0.09475073 0.3438776 0.9785950 0.9830372 0.6569047
## [3,] 0.11047678 0.1077013 0.5455975 0.8956746 0.1571282
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## [1,] 0.12636189 0.2655272 0.08843145 0.03171579
## [2,] 0.67234303 0.6890862 0.17560979 0.30795738
## [3,] 0.03340858 0.1389569 0.12938987 0.05592395
fit.VB$itemprob
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## [1,] 0.8588870 0.9850253 0.76086159 0.1085302 0.7895723
## [2,] 0.9397215 0.9312647 0.82346703 0.7380368 0.3406571
## [3,] 0.3339084 0.2268142 0.05678021 0.1598690 0.4414827
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## [1,] 0.4021508 0.216619346 0.1012336036 0.44082576 0.4471899
## [2,] 0.7619196 0.380476637 0.2873263199 0.67821731 0.6354185
## [3,] 0.1558255 0.001013152 0.0009432598 0.07663585 0.1978054
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## [1,] 0.04402172 0.14517401 0.9449849 0.9999985 0.1683718
## [2,] 0.08082083 0.32213651 0.9978032 1.0000000 0.6437150
## [3,] 0.09596543 0.09564748 0.5211017 0.9109057 0.1541671
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## [1,] 0.093463816 0.2509851 0.07663344 0.02110870
## [2,] 0.674124467 0.6760970 0.15677631 0.27729064
## [3,] 0.001478075 0.1040733 0.11771631 0.03049472
fit.GS.tuned$classprob
## [1] 0.5107913 0.3027477 0.1864610
fit.VB$classprob
## [1] 0.5160120 0.3173026 0.1666854
However, Gibbs sampling provides a better approximation of the posterior distributions. Observe that its posterior standard deviation estimates are larger than those obtained with the VB method; this is caused by the independence between parameter blocks enforced in VB, which typically underestimates posterior uncertainty.
fit.GS.tuned$itemprob.sd
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## [1,] 0.04857874 0.02558162 0.06291761 0.05720351 0.05051660
## [2,] 0.03769376 0.03811155 0.05652233 0.07253795 0.08266426
## [3,] 0.09142739 0.12691456 0.06579413 0.06759151 0.10354081
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## [1,] 0.06216102 0.05083708 0.03823216 0.06239750 0.05990135
## [2,] 0.07183806 0.06987098 0.06648555 0.06675373 0.06962007
## [3,] 0.06765567 0.03186953 0.03132139 0.06834100 0.07085235
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## [1,] 0.02556842 0.03933352 0.02792014 0.009396706 0.05081652
## [2,] 0.04124470 0.07069300 0.02040096 0.017307536 0.08170084
## [3,] 0.05607453 0.05549105 0.10311566 0.055415126 0.06599695
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## [1,] 0.05282257 0.05184842 0.03076281 0.01981604
## [2,] 0.07609116 0.07390731 0.05425701 0.07660064
## [3,] 0.03161159 0.06359193 0.06121339 0.04085680
fit.VB$itemprob.sd
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company
## [1,] 0.03427877 0.01481141 0.04147807 0.03091188 0.03973103
## [2,] 0.03187497 0.03345522 0.04753483 0.05414695 0.05805332
## [3,] 0.07840403 0.07073988 0.04521453 0.06329694 0.08206263
## x5.meet.friends x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films
## [1,] 0.04738550 0.04012982 0.03006293 0.04795733 0.04802399
## [2,] 0.05257888 0.05937866 0.05560678 0.05727905 0.05889846
## [3,] 0.06276362 0.02770660 0.02767599 0.04962291 0.06781280
## x10.cinema x11.smartphone x12.recommend x13.be.recommend x14.late
## [1,] 0.02155839 0.03465624 0.02353048 0.009325663 0.03666081
## [2,] 0.03554304 0.05729446 0.01589993 0.014902511 0.05862226
## [3,] 0.05339506 0.05333654 0.08250247 0.052105426 0.06254173
## x15.procrastinate x16.non.stop x17.too.much x18.addict
## [1,] 0.02911369 0.04212669 0.02687401 0.01650633
## [2,] 0.05745521 0.05737089 0.04555160 0.05505337
## [3,] 0.02790921 0.05485095 0.05715416 0.03822850
fit.GS.tuned$classprob.sd
## [1] 0.05724970 0.05892567 0.04728239
fit.VB$classprob.sd
## [1] 0.03482425 0.03243420 0.02611784
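The variance shrinkage of VB can be illustrated with a toy example (hypothetical, not part of the survey model): for a bivariate Gaussian posterior with correlation rho, the optimal mean-field factorization q(x1)q(x2) has marginal variances equal to (1 - rho^2) times the true ones.

```r
# Toy illustration (not the survey model) of why mean-field VB
# underestimates posterior variance. For a bivariate Gaussian N(0, Sigma)
# with unit variances and correlation rho, the optimal factorized
# approximation q(x1)q(x2) has marginal variance 1/Lambda_ii, where
# Lambda = solve(Sigma) is the precision matrix.
rho <- 0.8
Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
Lambda <- solve(Sigma)              # precision matrix
true.var <- diag(Sigma)             # true marginal variances: 1, 1
vb.var <- 1 / diag(Lambda)          # mean-field variances: 1 - rho^2 = 0.36
print(vb.var)
stopifnot(all(vb.var < true.var))   # VB variances are strictly smaller
```

The stronger the correlation between parameters, the more the factorized approximation understates the uncertainty, which matches the smaller standard deviations reported by `fit.VB` above.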
We may also observe these differences in the plots of the density estimates of the model parameters, both for the item and the class probabilities:
par(mfrow = c(3,3))
plot(fit.GS.tuned,which=3)
par(mfrow = c(3,3))
plot(fit.VB,which=3)
As an extra plot, we are going to use the factoextra library to perform a K-means clustering (almost the same as using the Bayesian EM). With this library we can obtain a 2D representation of the different clusters, since the algorithm applies a PCA as a postprocessing step.
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
set.seed(150)
km.res = kmeans(X, 3, nstart = 40)
fviz_cluster(km.res,X,ellipse = TRUE)
print(km.res)
## K-means clustering with 3 clusters of sizes 53, 85, 64
##
## Cluster means:
## x0.Recent x1.Subscription x2.Shared x3.Pirate x4.in.company x5.meet.friends
## 1 0.4150943 0.4716981 0.05660377 0.1509434 0.5471698 0.1320755
## 2 0.9294118 0.9882353 0.91764706 0.1058824 0.8117647 0.4941176
## 3 0.9375000 0.9531250 0.82812500 0.7343750 0.3281250 0.7343750
## x6.day.hrs x7.week.hrs x8.trends x9.series.vs.films x10.cinema x11.smartphone
## 1 0.03773585 0.03773585 0.1698113 0.2075472 0.07547170 0.0754717
## 2 0.25882353 0.11764706 0.4705882 0.4823529 0.04705882 0.1764706
## 3 0.35937500 0.26562500 0.6718750 0.6562500 0.07812500 0.3125000
## x12.recommend x13.be.recommend x14.late x15.procrastinate x16.non.stop
## 1 0.6792453 0.9433962 0.1132075 0.03773585 0.1320755
## 2 0.9411765 1.0000000 0.2000000 0.04705882 0.2235294
## 3 1.0000000 1.0000000 0.6406250 0.73437500 0.7343750
## x17.too.much x18.addict
## 1 0.07547170 0.03773585
## 2 0.08235294 0.02352941
## 3 0.17187500 0.26562500
##
## Clustering vector:
## [1] 3 3 3 3 3 3 3 3 2 3 1 1 3 3 2 2 1 3 1 2 1 3 2 2 2 1 2 2 2 2 1 1 3 1 1 2 3
## [38] 3 1 2 3 1 2 2 3 1 1 2 2 2 2 1 2 1 1 3 1 2 1 1 2 2 3 1 2 2 2 1 2 2 2 3 3 1
## [75] 1 3 1 2 3 1 2 1 2 3 2 2 2 2 1 2 1 1 3 3 2 2 2 3 2 1 2 2 3 2 2 3 2 2 2 2 1
## [112] 2 2 1 2 1 3 1 3 2 3 2 1 1 3 2 3 2 1 2 3 1 1 2 3 3 3 1 3 2 1 3 2 2 3 3 2 3
## [149] 3 3 2 1 1 3 3 1 2 3 2 2 3 2 3 3 3 2 3 2 3 3 1 1 2 2 2 3 1 1 2 3 2 2 2 3 2
## [186] 3 3 2 1 2 3 1 1 3 2 1 2 2 2 2 2 3
##
## Within cluster sum of squares by cluster:
## [1] 115.6604 184.2118 190.2188
## (between_SS / total_SS = 23.8 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
Now that all the models are implemented, we are going to plot the clustering results of each model broken down by age and gender.
cluster1 = data[which(km.res$cluster == 1),]
cluster2 = data[which(km.res$cluster == 2),]
cluster3 = data[which(km.res$cluster == 3),]
par(mfrow=c(1,3))
pie(table(cluster1["Age"]),main="Cluster 1 K-means",radius=1)
pie(table(cluster2["Age"]),main="Cluster 2 K-means",radius=1)
pie(table(cluster3["Age"]),main="Cluster 3 K-means",radius=1)
par(mfrow=c(1,2))
a = dim(cluster1[which(cluster1$Age=="16-21"),])[1]
b = dim(cluster1[which(cluster1$Age=="22-30"),])[1]
c = dim(cluster2[which(cluster2$Age=="16-21"),])[1]
d = dim(cluster2[which(cluster2$Age=="22-30"),])[1]
e = dim(cluster3[which(cluster3$Age=="16-21"),])[1]
f = dim(cluster3[which(cluster3$Age=="22-30"),])[1]
total = a+b+c+d+e+f
cluster1_prop = (a+b)/total
cluster2_prop = (c+d)/total
cluster3_prop = (e+f)/total
pie(x=c(cluster1_prop,cluster2_prop,cluster3_prop),main="Ages 16-30 Cluster Prop K-means",radius=1,labels = c("C1","C2","C3"))
a = dim(cluster1[which(cluster1$Age=="31-50"),])[1]
b = dim(cluster1[which(cluster1$Age=="50-70"),])[1]
c = dim(cluster2[which(cluster2$Age=="31-50"),])[1]
d = dim(cluster2[which(cluster2$Age=="50-70"),])[1]
e = dim(cluster3[which(cluster3$Age=="31-50"),])[1]
f = dim(cluster3[which(cluster3$Age=="50-70"),])[1]
total = a+b+c+d+e+f
cluster1_prop = (a+b)/total
cluster2_prop = (c+d)/total
cluster3_prop = (e+f)/total
pie(x=c(cluster1_prop,cluster2_prop,cluster3_prop),main="Ages 31-70 Cluster Prop K-means",radius=1,labels = c("C1","C2","C3"))
par(mfrow=c(1,3))
pie(table(cluster1["Gender"]),main="Cluster 1 K-means",radius=1)
pie(table(cluster2["Gender"]),main="Cluster 2 K-means",radius=1)
pie(table(cluster3["Gender"]),main="Cluster 3 K-means",radius=1)
Note that the different algorithms need not use the same cluster order; that is, cluster 1 in K-means could be cluster 3 in GS. Even so, matching the corresponding classes should be easy, as their shapes are going to be more or less the same.
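One way to match labels across two clusterings automatically (a hypothetical helper, not used in the original analysis) is to try every permutation of the three labels and keep the one that maximizes agreement:

```r
# Hypothetical helper: relabel clustering 'b' so that it agrees as much as
# possible with clustering 'a', by brute force over the 3! = 6 permutations.
align.labels <- function(a, b) {
  perms <- rbind(c(1,2,3), c(1,3,2), c(2,1,3),
                 c(2,3,1), c(3,1,2), c(3,2,1))
  best <- perms[which.max(apply(perms, 1, function(p) sum(a == p[b]))), ]
  best[b]  # b relabelled into a's numbering
}
a <- c(1, 1, 2, 2, 3, 3)
b <- c(3, 3, 1, 1, 2, 2)            # same partition, different labels
stopifnot(all(align.labels(a, b) == a))
```

For three clusters brute force is trivial; with more clusters one would solve the same matching as an assignment problem instead.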
ClusterProbEM = as.data.frame(Zscore(X,fit=fit.EM))
colnames(ClusterProbEM) = c(1,2,3)
ClusterNumbersEM = as.integer(colnames(ClusterProbEM)[apply(ClusterProbEM,1,which.max)])
ClusterProbGS = as.data.frame(Zscore(X,fit=fit.GS.tuned))
colnames(ClusterProbGS) = c(1,2,3)
ClusterNumbersGS = as.integer(colnames(ClusterProbGS)[apply(ClusterProbGS,1,which.max)])
ClusterProbVB = as.data.frame(Zscore(X,fit=fit.VB))
colnames(ClusterProbVB) = c(1,2,3)
ClusterNumbersVB = as.integer(colnames(ClusterProbVB)[apply(ClusterProbVB,1,which.max)])
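The hard assignments above follow the maximum a posteriori rule: each respondent is assigned to the class with the highest posterior probability. The same `apply`/`which.max` pattern on a toy probability matrix (hypothetical numbers):

```r
# MAP assignment on a toy posterior class-probability matrix
# (hypothetical values; one row per respondent, one column per class).
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.3, 0.6,
                  0.2, 0.5, 0.3), nrow = 3, byrow = TRUE)
map.class <- apply(probs, 1, which.max)  # index of the most probable class
print(map.class)                         # 1 3 2
stopifnot(all(map.class == c(1, 3, 2)))
```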
## [1] "Male respondants: 72"
## [1] "Female respondants: 126"
As we can see, gender has nothing to do with the clusters, but young people tend to be more addicted. Also, the results of the different models do not differ much; the largest differences appear between the frequentist and the Bayesian approaches, but even those are similar. Given the computational costs, the frequentist K-means and VB would be the models of choice.
We have implemented a Bayesian approach for clustering different profiles of series watchers according to their behavior.
We have implemented three different Bayesian approaches: EM, MCMC (Gibbs sampling) and VB.
The frequentist K-means also offers good results that differ slightly from the Bayesian ones.
VB is the fastest approach but tends to underestimate the variance.
Gibbs sampling is very time-consuming but offers a better approximation.
Young people tend to fall more often into the series-addict profile.